METHOD OF IDENTIFYING THE SOURCE OF
GENETIC INFORMATION IN DNA

FIELD OF THE INVENTION 

The present invention relates to a method for embedding
watermark information in DNA to identify the source of
genetic information provided for the DNA.

BACKGROUND OF THE INVENTION 

Of the various plants and animals found on earth, there are
organisms such as soy beans, which have acquired a natural
resistance to noxious insects, that possess qualities that
may be considered superior when compared with those of
others of the same species. Further, there are organisms
such as racehorses, valued as the offspring of good breeding
stock, for which worth is assigned based on an artificial
evaluation reference. When these properties and values are
rated and levels are assigned to the genes that produce
them, the genes are credited with providing an added value,
as being "value-added genes". And even today, such
so-called value-added genes are being traded for money. For
example, an organism credited with having one of these
value-added genes normally fetches a higher price than does
another that is not so credited.

While value-added genes may be produced as a result of
natural selection, in most cases today, artificial,
intentional manipulation is employed to develop or generate
such genes. And it is anticipated that as the development
of life sciences continues, the intentional production,
through artificial manipulation, of value-added genes (and
of organisms in which value-added genes are dominant) can
only increase.

In this case, for economic reasons a producer who has
developed a value-added gene may not wish that it be freely
available for third party use. For example, a producer
holding the original genetic information for a value-added
gene may permit a third party to use the gene under
conditions whereby its employment is limited to a single
generation, i.e., a condition whereby copying, including
breeding or cultivating or copying at the DNA
(deoxyribonucleic acid) or the RNA (ribonucleic acid) level,
is inhibited.

However, the copying of plants or animals using genetic
information can be performed by gathering spermatozoa or
seeds, without highly technical or expensive apparatuses
being required. Further, when bioengineering techniques are
employed, the high-level copying of genetic information can
be performed at the DNA or the RNA level.

As is described above, the genetic information carried by
plants and animals can be copied without using highly
technical or expensive apparatuses, and by using the
techniques embodied in bioengineering, the high-level
copying of genetic information at the DNA or RNA level can
also be performed.

Therefore, it is very difficult to apply technical 
restrictions to the copying, by third parties, of the above 
described value-added genes. 

Further, when a value-added gene generated by a
predetermined producer is detected in a specific organism,
it is difficult to determine whether the gene was illegally
copied, because it is hard to distinguish copying from gene
mutation.

If predetermined information, such as ID information, can be
embedded in the DNA nucleotide sequence of a specific
value-added gene, when the value-added gene is copied the ID
information will also be copied. Therefore, when an
examination is made to decide whether the DNA nucleotide
sequence of an organism having a value-added gene includes
ID information, a determination can be made as to whether
the value-added gene was obtained by copying.

It is, therefore, one object of the present invention to
embed predetermined information in the nucleotide sequence
of DNA and to identify the source of the genetic information
in DNA.

It is another object of the present invention to detect and
analyze information that is intentionally embedded in the
sequence of nucleotides making up DNA and to determine
whether a predetermined gene owned by a predetermined
organism is a copy of a specific gene.

JP920000069US1 - CPA
Kashima et al

It is an additional object of the present invention to
provide means that can determine whether a predetermined
gene owned by a predetermined organism is a copy of a
specific gene, and to thus prevent the illegal copying of
the specific gene by a third party.



SUMMARY OF THE INVENTION 

To achieve these objects, according to the present
invention, the following method for writing information in
DNA is provided. Specifically, a method for writing
information in DNA comprises the steps of: correlating the
pattern of a nucleotide sequence, which normally does not
appear in a portion of the DNA other than a gene, with
identification information for identifying a source of
predetermined genetic information belonging to the DNA; and
embedding, in the portion of the DNA other than the gene,
the nucleotide sequence that is correlated with the
identification information.

When the pattern of the nucleotide sequence does not
normally appear in a portion other than a gene, it means
that it is stochastically ensured that under normal
conditions this pattern is not present in a portion of DNA
other than a gene. This probability can be calculated by a
statistical process using a frequency distribution.

According to the present invention, a method is provided for
writing information in the gene portion of a DNA molecule,
instead of in the other portion. Specifically, the method
for writing information in DNA comprises the steps of:
correlating the pattern of a nucleotide sequence, which
normally does not appear in the intron of a DNA, with
identification information for identifying the source of
predetermined genetic information owned by the DNA; and
embedding, in the intron of the DNA, the nucleotide sequence
that is correlated with the identification information.
When the pattern of the nucleotide sequence does not
normally appear in the intron, it means that it is
stochastically ensured that under normal conditions this
pattern is not present in the intron of DNA. This
probability can be calculated by employing a statistical
process for which a frequency distribution is used.

Further, according to the present invention, a method is
provided for writing information in an exon of DNA, which
includes genetic information. That is, the method for
writing information in DNA comprises the steps of: employing
redundancy for a codon to be translated into amino acid so
that multiple codons to be translated into the same amino
acid are correlated with binary data; and arranging, in the
exon of a gene, the codons that are correlated with the
binary data, and to thus form a data sequence representing
predetermined information. With this configuration, the
genetic information and the binary data are multiplexed and
included in an array of codons.

As for codon redundancy, even when the action of codons,
relative to the kinds of amino acid into which the codons
are to be translated, is the same, the use of codons varies,
depending on the species of an organism, and is normally
biased. Therefore, in order to restrict the influence on
the organism as much as possible, it is preferable that
codons frequently employed by the targeted organism be
selected for information embedding, and that they be
correlated with binary data.

Further, according to the present invention, a method is
provided for employing the information thus inserted into
DNA to identify the source of genetic information in DNA
that has been obtained from a predetermined organism.
Specifically, this method comprises the steps of: obtaining
DNA from an arbitrary organism of the same species as an
organism wherein a source identification nucleotide
sequence, for designating the source of genetic information,
is embedded into the DNA; and employing a nucleotide
sequence complementary to the source identification
nucleotide sequence in order to determine whether the source
identification nucleotide sequence is present in the DNA of
the arbitrary organism.
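
The detection step described above can be illustrated with a
minimal sketch. It assumes that hybridization of a
complementary sequence can be modeled as an exact string
match, which is a simplification; the function names
reverse_complement and probe_binds, and the toy sequences,
are hypothetical and are not taken from the specification.

```python
# Sketch only: models probe hybridization as exact matching of the
# reverse complement. Real hybridization tolerates mismatches and
# depends on reaction conditions; none of that is modeled here.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (5'->3')."""
    return seq.translate(COMPLEMENT)[::-1]


def probe_binds(dna_strand: str, probe: str) -> bool:
    """True if the probe is perfectly complementary to some region
    of dna_strand, i.e. the strand contains the probe's reverse
    complement."""
    return reverse_complement(probe) in dna_strand


# Hypothetical watermark and the probe designed against it.
watermark = "AAAGTC"
probe = reverse_complement(watermark)

marked_dna = "CCGT" + watermark + "GGCA"    # DNA carrying the watermark
unmarked_dna = "CCGTGGCATTAC"               # DNA without the watermark
```

With these toy inputs, probe_binds(marked_dna, probe) is true
and probe_binds(unmarked_dna, probe) is false.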

Furthermore, according to the present invention, a DNA is
provided to which information is added by the above
information writing method. Specifically, this DNA
comprises: a gene portion including genetic information; and
a portion, other than the gene portion, including no genetic
information, wherein the portion other than the gene portion
includes a nucleotide sequence that is correlated with
source identification information and specifies a source of
genetic information that is transmitted by the gene portion.

The gene portion that includes genetic information also
includes an exon that is translated into amino acid when
protein is to be synthesized, and an intron that is removed
when protein is to be synthesized, and the intron includes a
nucleotide sequence that is correlated with source
identification information for designating a source of
genetic information that is included in the exon.

DNA includes multiple kinds of codons that are correlated
with the binary data using the codon redundancy and are
translated into amino acid, and binary data are used to
correlate the codon array in the gene portion with a data
sequence that represents predetermined information.

DNA is provided wherein a special sequence that is
intentionally designed is included as a part of a nucleotide
sequence, wherein the special sequence is correlated with
source identification information for designating the source
of genetic information included in the DNA, and wherein the
special sequence is embedded in the DNA so as not to affect
the transmission of the genetic information included in the
DNA.

For the DNA to which these data are added, multiple special
sequences (nucleotide sequences correlated with information)
are repetitively embedded, or multiple kinds of special
sequences are embedded, in portions other than the gene
portions or in corresponding locations, such as introns and
exons.

Since multiple special sequences, or multiple kinds of
special sequences are embedded, the probability that a
special sequence will be naturally destroyed or will be
naturally generated through the mating process can be
reduced.

In addition, the present invention can be provided as a
nucleotide sequence that is designed to add information to
DNA, or the cell of an organism that includes DNA to which
information has been added.

BRIEF DESCRIPTION OF THE DRAWINGS 

Fig. 1 is a diagram for explaining the process for
synthesizing genes in a DNA to obtain protein.

Fig. 2 is a flowchart for explaining the general processing
according to this invention used to determine a watermark
sequence, embed it in a DNA and detect it therein.

Fig. 3 is a diagram for explaining the concept of a method 
for inserting a watermark sequence according to a first 
embodiment of the present invention. 

Fig. 4 is a flowchart showing the processing according to
the first embodiment used for calculating an appearance
probability of a sequence based on DNA sequence data, and
for determining a proposed watermark sequence.

Fig. 5 is a diagram showing an example frequency 
distribution, using pseudo data, for a nucleotide sequence 
having six bases in one organism. 

Fig. 6 is a diagram showing an example frequency 
distribution graph of the number of organisms relative to 
the appearance frequency of a nucleotide sequence AAAGTC in 
Fig. 5. 

Fig. 7 is a flowchart showing the processing for confirming
the safety of a watermark sequence according to the first
embodiment.

Fig. 8 is a diagram showing the state wherein a watermark
sequence is detected by using a complementary nucleotide
sequence.

Fig. 9 is a diagram showing the state wherein the nucleotide
sequence of a DNA is read using a sequencer, and a watermark
sequence is detected.

Fig. 10 is a diagram for explaining the concept of a method 
for inserting a watermark sequence in accordance with a 
second embodiment of the present invention. 

Fig. 11 is a diagram for explaining the concept of a method 
for inserting a watermark sequence in accordance with a 
third embodiment of the present invention. 

Fig. 12 is a table showing the toleration for the first to
the third embodiments relative to the individual copying
methods.

Fig. 13 is a table showing codons and corresponding amino 
acids (or special meanings). 

DESCRIPTION OF THE PREFERRED EMBODIMENTS 

The preferred embodiments of the present invention will now
be described in detail while referring to the accompanying
drawings.

First, an overview of the present invention will be given.
According to the present invention, a nucleotide sequence
carrying predetermined information, such as ID information,
is embedded in DNA, so that DNA including such a nucleotide
sequence is distinguishable. Hereinafter, the nucleotide
sequence carrying this predetermined information is called a
watermark sequence, and the information represented by the
watermark sequence is called watermark information.

When this watermark sequence is embedded in DNA including a
value-added gene that is provided by selective breeding or
through gene manipulation, if the value-added gene is copied
during the breeding process or by employing various other
methods, the source of the genetic information in the gene
can be identified. And if the watermark sequence is
detected in the DNA of a predetermined organism, it can be
ascertained that the gene of the organism is a copy of the
DNA wherein the watermark was previously embedded, and is
not one that is naturally generated through gene mutation.
With this watermarking method, even when a value-added gene
is copied, it can be determined whether the copying was
performed legally or illegally.

The specific procedures performed when embedding the 
watermark sequence are as follows. 

(1) A watermark sequence W is embedded in the DNA of a germ
cell, such as a spermatozoon, an ovum or a zygote carrying
superior genetic information I (where W represents a
spermatozoon and A represents an ovum).

(2) A is fertilized, grows and becomes an imago A'.

(3) Imago A' copies its own genetic information to form B (a
spermatozoon or an ovum).

(4) B is fertilized, grows and becomes an imago B'.

(5) When the watermark sequence W is detected in the DNA of
imago B', it can be determined that imago B' includes a copy
of the genetic information I.

As the copying of the gene is repeated during mating or as
an extended time elapses, the genetic information I is
degraded. Therefore, when the genetic information I is so
degraded that the watermark sequence W cannot be detected
in the DNA, it can be ascertained that the value of the
genetic information I has also been degraded.

According to the invention, recognition of watermarked DNA
is possible only when a watermark sequence is present. That
is, it is not anticipated that watermark information in a
watermark sequence will have a specific effect on and alter
an organism that includes the DNA in question. Therefore,
the present invention can be employed for all species,
including plants and animals.

The form of the genetic information in a cell will now be
described through an explanation of the overview of a
process by which a gene codes for a protein molecule.

Fig. 1 is a diagram for explaining a process for
synthesizing a gene in DNA to obtain protein.

Arranged in the DNA are four bases, A (adenine), T
(thymine), G (guanine) and C (cytosine). This sequence of
the four bases (hereinafter the bases are referred to by
their initials, A, T, G and C) of DNA consists of a gene
portion wherein a protein code sequence and its
transcription control information are stored, and a portion
wherein genetic information is not included. As is shown in
Fig. 1, through the transcription process employed for the
synthesis of protein, only the gene portion pertinent
to the protein is transcribed as an intermediate genetic
material called mRNA. In the case of a higher organism, the
mRNA consists of an exon, which is finally translated into
amino acid, and an intron, which is removed during the
process (this state is called the primary mRNA). Finally,
the intron is removed (splicing), and the final mRNA (mature
mRNA) is obtained. The final mRNA is then translated into
protein.
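
The transcription and splicing steps outlined above can be
sketched as follows. This is a toy model under stated
assumptions: the gene is given as its coding-strand DNA, the
exon boundaries are already known and supplied as half-open
index pairs, and the function names transcribe and splice as
well as the example sequence are hypothetical.

```python
# Sketch only: transcription is modeled as replacing T with U on the
# coding strand, and splicing as keeping the exon intervals.
from typing import List, Tuple


def transcribe(gene_dna: str) -> str:
    """Coding-strand DNA -> primary mRNA (thymine replaced by uracil)."""
    return gene_dna.replace("T", "U")


def splice(primary_mrna: str, exons: List[Tuple[int, int]]) -> str:
    """Join the exon intervals [start, end) and drop everything else
    (the introns), giving the mature mRNA."""
    return "".join(primary_mrna[start:end] for start, end in exons)


# Hypothetical gene: exon 1, one intron, exon 2.
exon1, intron, exon2 = "ATGGCC", "GTAAGTTTTCAG", "TTTTAA"
gene = exon1 + intron + exon2

primary = transcribe(gene)
mature = splice(primary, [(0, 6), (18, 24)])
```

In this model a watermark placed inside the intron is present
in the DNA and in the primary mRNA but is absent from the
mature mRNA.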

Now, a method for copying the genetic information will be
explained.

Technically different methods for copying genetic
information can be employed in accordance with the physical
storage of genetic information. For example, if the genetic
information is copied unchanged in the DNA form, an
inexpensive and easy method, such as breeding or
cultivating, or a method for extracting one region from the
DNA, including the gene, can be employed. Further, a method
for copying the genetic information from another state, such
as RNA, or a method for reading a sequence of genes and
synthesizing them, can also be employed.

To determine the source of genetic information using the
watermark information, the watermark information should be
able to tolerate the copying process performed by the
various methods mentioned above. When the watermark
information can tolerate the copying, it means that the
watermark sequence can be maintained even after the gene is
copied, and can thus be detected. As is described above,
since various technically different methods can be employed
when copying genetic information, these copying methods
should be taken into consideration for the embedding of a
watermark sequence in DNA, in order to provide copy
toleration for the watermark information. In the present
invention, the following three methods are proposed while
also taking into account the safety of the watermark
sequence, which will be described later:

a) a method for inserting a watermark sequence into the
portion of the DNA other than the gene portion;

b) a method for inserting a watermark sequence into the
intron of the gene to be protected; and

c) a method for embedding watermark information by using the
codon redundancy.

A further description will be given here of the method by
which codon redundancy is used to embed watermark
information.

As is described above, the DNA consists of the four bases A,
C, G and T. When the DNA is transcribed into RNA, thymine
(T) is replaced by uracil (hereinafter referred to as U),
and thus, RNA consists of a sequence of the four bases A, C,
G and U. For the translation of the bases into amino
acid, a codon consisting of a set of three of the bases A, C,
G and U is employed as a unit.



Fig. 13 is a table showing codons and corresponding amino
acids (or special definitions).

In the table in Fig. 13 (hereinafter referred to as a codon
table), codons are arranged in the left columns, and amino
acids or special definitions are arranged in the right
columns and are represented by their abbreviations:
phenylalanine (Phe), leucine (Leu), serine (Ser), tyrosine
(Tyr), cysteine (Cys), tryptophan (Trp), proline (Pro),
histidine (His), glutamine (Gln), arginine (Arg), isoleucine
(Ile), methionine (Met), threonine (Thr), asparagine (Asn),
lysine (Lys), valine (Val), alanine (Ala), aspartic acid
(Asp), glutamic acid (Glu) and glycine (Gly). Note that
"termination" indicates that the process for the translation
of a codon into amino acid is terminated.



As is apparent from the codon table, a codon does not have a
one-to-one correspondence with amino acid, and there are
multiple codons that can be translated into one amino acid.
This redundancy means that even when a sequence differs at
the RNA level (or in the DNA before transcription), at the
final amino acid level the same material is obtained by
synthesization.

Since for an organism all that is necessary is that amino
acid be correctly generated by synthesization, at the DNA or
RNA level an arbitrary codon within this redundancy range
can be selected. When this fact is employed and a codon
representing a predetermined gene is intentionally selected,
watermark information can be written.
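
The use of codon redundancy described above can be sketched
as follows. The sketch assumes the standard genetic code and
an arbitrary bit assignment: for each amino acid with two or
more synonymous codons, the two alphabetically first codons
stand for bit 0 and bit 1. A practical assignment would
instead follow the codon usage of the target organism. The
names BIT_CODONS, embed_bits and extract_bits, and the example
mRNA, are hypothetical.

```python
# Sketch only: writes one bit per redundant codon without changing the
# encoded amino acid sequence. Bit assignment is an arbitrary assumption.
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Group synonymous codons by amino acid ("*" marks termination).
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)

# Amino acids that can carry a bit: index 0 encodes 0, index 1 encodes 1.
BIT_CODONS = {aa: sorted(cs)[:2] for aa, cs in SYNONYMS.items()
              if len(cs) >= 2 and aa != "*"}


def codons_of(mrna):
    return [mrna[i:i + 3] for i in range(0, len(mrna) - len(mrna) % 3, 3)]


def translate(mrna):
    return "".join(CODON_TABLE[c] for c in codons_of(mrna))


def embed_bits(mrna, bits):
    """Rewrite redundant codons so that they spell out `bits`."""
    pending, out = list(bits), []
    for codon in codons_of(mrna):
        aa = CODON_TABLE[codon]
        if pending and aa in BIT_CODONS:
            codon = BIT_CODONS[aa][pending.pop(0)]
        out.append(codon)
    if pending:
        raise ValueError("sequence too short for the requested bits")
    return "".join(out)


def extract_bits(mrna, count):
    """Read back the first `count` bits from the redundant codons."""
    bits = []
    for codon in codons_of(mrna):
        aa = CODON_TABLE[codon]
        if aa in BIT_CODONS and len(bits) < count:
            bits.append(BIT_CODONS[aa].index(codon))
    return bits


original = "AUGGCUAAAGGUUAA"             # Met-Ala-Lys-Gly-termination
marked = embed_bits(original, [1, 0, 1])
```

Here translate(original) and translate(marked) give the same
amino acid sequence, while extract_bits(marked, 3) recovers
the embedded bits.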

An explanation will now be given for a condition that
permits a nucleotide sequence to be employed as a watermark
sequence. In order to embed a predetermined nucleotide
sequence in DNA as a watermark sequence, the nucleotide
sequence must be safe and must have probative force.
When the nucleotide sequence is safe, it means that the
nucleotide sequence is not biologically significant,
i.e., an organism is not affected by the insertion in DNA of
a watermark sequence. The procedures for confirming the
safety of a food, for example, must be performed in
accordance with general standards, such as the "Guideline
for evaluation of the safety of recombinant DNA techniques
for foods and food additives" established by the Ministry of
Health and Welfare.

Furthermore, while taking safety into account, the position
whereat the watermark sequence can be inserted is limited to
a portion of the DNA that is not biologically significant.
Therefore, as is described above, a portion of DNA other
than a gene, or the intron of a gene, is selected as the
portion wherein the watermark sequence is embedded in DNA.
It should be noted that when the codon redundancy is
employed, safety is ensured, so that watermark information
can be embedded in an exon that is biologically significant.

When the nucleotide sequence has probative force, it
means that it is guaranteed that detection of the watermark
sequence indicates that copying was performed. That is, a
sequence that corresponds to the watermark sequence should
not originally be present in DNA, or should not occur
naturally due to a slight change in the DNA. To implement
this probative force, it must be stochastically guaranteed
that the same sequence as the watermark sequence does not
appear naturally. Therefore, the size of the watermark
sequence, the number of watermark sequences, the type of the
watermark sequence and the insertion location must be taken
into consideration. The setting of these parameters will be
specifically described later.

When the codon redundancy is employed to embed watermark
information, a rare combination of codons should be employed
to write the watermark information. This can prove that the
sequence was not coincidentally inserted into a gene, but
was intentionally inserted in the gene as watermark
information.

Fig. 2 is a flowchart for explaining the general processing
used for the determination of a watermark sequence, its
embedding in a DNA and its detection therein.

In Fig. 2, a watermark sequence is determined based on DNA
sequence data (step 201). The watermark sequence
determination method will be described later. Then, the
watermark sequence is embedded in the DNA of an object
organism (step 202). Following this, the safety of the DNA
in which the watermark sequence is embedded is examined
(step 203). When the safety of the DNA is confirmed, the
organism including the DNA is produced, and the DNA is
copied (step 204). Thereafter, the watermark sequence is
detected as needed in the DNA of an organism of the same
species, and the source of the DNA information carried by
the organism is identified (step 205).



A technique that is similar to this invention, for
preventing the illegal copying of a value-added gene, is
disclosed in US patent publication 5,723,765. This
technique prevents germination of the seed at the second
generation by gene manipulation. Since the seeds gathered
from crops that are manipulated using this technique do not
germinate, and producers must buy seeds every year, the
profits of seed/seedling developing companies can be
protected.

According to this technique, when seeds or seedlings of
crops grown through the normal growth process of
germination, blooming and pollination mature beyond the
dormant stage and reach the growth point at which a second
generation or leaf buds are to be developed in the seeds,
protein containing a toxin, which is generated by a toxic
gene, is recombined in a gene and kills the seeds.

However, with this technique, the dispersion of a toxic gene
that kills embryo buds, the effect on the body of a human
who ingests the toxic protein, especially as it relates
to allergic reactions, and the effect on birds and
insects that eat these seeds and on microorganisms, such as
molds and viruses, are unknown. In the present invention,
as in this technique, a nucleotide sequence having an
unknown function is embedded in the DNA; however, at the
least, no nucleotide sequences that are known to
generate a toxic material are embedded.

In addition, in order to control the time for the production
of a toxic protein, the above technique employs a promoter
that is activated when an embryo is developed. But since
for the present invention, a function that depends on an
organism is not required, the present invention can be
easily employed for a variety of organisms.

As is described above, in this invention, taking into
account the copy toleration and the safety of the genetic
information, the method used to insert a watermark sequence
in a portion of a DNA other than the gene, the method for
inserting a watermark sequence into the intron of the gene,
and the method for employing the codon redundancy to embed
watermark information have been proposed as methods for
embedding a watermark sequence in the DNA. Since these
watermark sequences differ in form (the insertion location,
the sequence size, etc.), conditions required for an
insertable watermark sequence vary, and copy toleration and
ease of carrying out the method also differ.

The preferred embodiments for the respective methods will
now be described while referring to the accompanying
drawings.

First Embodiment 

First, the embodiment employing the method for the insertion
of a watermark sequence in the portion of a DNA other than
the gene will be described.

In this embodiment, so long as a watermark sequence is
detected in DNA, the source of the genetic information for
the DNA can be specified. Therefore, the watermark sequence
can be inserted at any location at random, and even if it is
inserted in the gene portion of the DNA, the function of the
watermark sequence is not lost. However, when a watermark
sequence that is unrelated to the genetic information for an
organism is inserted into a gene portion, the organism may
be affected in some way (this excludes the other embodiments
that will be described later: the method for inserting a
watermark sequence in an intron, and the method for
employing the codon redundancy to embed watermark
information). Therefore, the watermark sequence must be
inserted into a portion of the DNA other than the gene
portion.

Fig. 3 is a diagram for explaining the concept of the
watermark sequence insertion method according to this
embodiment.

For this embodiment, an explanation will now be given for
the individual steps shown in Fig. 2, i.e., (1)
determination of a watermark sequence, (2) embedding a
watermark sequence in DNA, (3) confirmation of safety, (4)
detection of a watermark sequence, and (5) toleration of a
watermark sequence.

(1) Determination of a watermark sequence 

A nucleotide sequence usable as a watermark sequence is
determined. As is described above, this nucleotide sequence
is a sequence (hereinafter referred to as a sequence that
normally does not appear) having a pattern that normally
does not appear in the DNA of a target organism for the
embedding of the watermark sequence. Specifically, the
nucleotide sequence is determined as follows.

Assume that the total number of bases in the DNA of an
object organism is defined as N and a watermark sequence
having n bases is embedded in the DNA. Since there are four
types of bases (A, T, G and C), a proposed watermark
sequence WM having n bases that satisfies a condition need
only be selected from a set S(n) of $4^n$ choices. That is,

$$\text{proposed } WM \in S(n), \qquad |S(n)| = 4^{n}$$



When V(n) denotes the set of sequences that are especially
significant or that have a high probability of appearing in
normal DNA, the watermark sequence WM need only be selected
from S(n) - V(n). That is,

$$\text{WM actually employed} \in S(n) - V(n)$$

Since the number of elements in S(n) increases exponentially
with n, a sequence that does not normally appear can be
found, even though it has a short length.

Assume that a watermark sequence having a length n is to be
embedded in the DNA of a human (hereinafter referred to as
human DNA). Since the human DNA has a length of about 3
billion bases, about 3 billion partial sequences having the
length n are arranged in the human DNA. Supposing that
these partial sequences are arranged evenly, there are $4^n$
different expressions for the watermark sequences of the
length n. Thus, for a partial sequence of about 30 bases, a
number of sequence types greatly exceeding 3 billion can be
obtained. Therefore, even when a nucleotide sequence of
roughly 30 bases is employed, satisfactory sequences can be
obtained that do not normally appear. Among these
sequences, a restriction enzyme recognition sequence, or
a sequence such as a promoter that does not include
biological meaning can be selected as a proposed watermark
sequence.
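
The counting argument above can be checked with a short
calculation. The sketch assumes, as the text does, that
partial sequences are evenly distributed; the function names
and the genome sizes passed in are illustrative assumptions,
not values fixed by the specification.

```python
# Sketch only: compares the number of possible n-base sequences, 4**n,
# with the number of positions in a genome, under a uniform model.

def sequence_space(n: int) -> int:
    """Number of distinct sequences of n bases over A, T, G and C."""
    return 4 ** n


def min_length_exceeding(num_positions: int) -> int:
    """Smallest n for which 4**n exceeds the number of positions."""
    n = 1
    while sequence_space(n) <= num_positions:
        n += 1
    return n


def expected_chance_hits(num_positions: int, n: int) -> float:
    """Expected number of times one fixed n-base sequence appears by
    chance among num_positions windows, assuming uniform bases."""
    return num_positions / sequence_space(n)


# Illustrative genome sizes (assumptions for the example only).
for positions in (3_000_000_000, 30_000_000_000):
    print(positions, min_length_exceeding(positions),
          expected_chance_hits(positions, 30))
```

For a genome of a few billion positions the sequence space
already exceeds the number of positions at well under 20
bases, so a 30-base sequence leaves a very large margin under
this simplified model.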

In actuality, arbitrary partial sequences in the DNA are
biased. However, at present a sequence determination for
the DNA of several species has been completed and it is
forecast that the nucleotide sequence of the DNA will be
gradually explicated for all organisms, including human
beings. Based on all the nucleotide sequences, the
distribution of the partial sequences can actually be
understood, even though only approximately.

Fig. 4 is a flowchart showing the processing for calculating 
the appearance probability based on DNA sequence data and 
for determining a proposed watermark sequence. 

In Fig. 4, the threshold value of a probability is set to
guarantee that a nucleotide sequence selected as a watermark
sequence is a sequence that does not normally appear in the
DNA of an object organism (step 401). Then, the DNA
sequence data are employed to calculate the probability
whereat a predetermined nucleotide sequence will appear in
the DNA (step 402). When the probability is smaller than
the threshold value at step 401, the pertinent sequence is
defined as a proposed watermark sequence (step 403).

If the overall nucleotide sequences in the DNA are well
known, the probability that the sequence does not normally
appear can be calculated approximately, based on these
sequences.

To illustrate the probability calculation method that
guarantees that a predetermined nucleotide sequence does not
normally appear in the DNA, the process for determining a
watermark sequence having a length of six bases will be
described by using simple pseudo data.

First, to determine the proposed watermark sequences, a
frequency distribution of nucleotide sequences having a
length of six bases in one organism is employed. Fig. 5 is
a diagram showing an example frequency distribution for the
nucleotide sequences. Assume that AAAGTC is selected as a
proposed watermark sequence. Since the frequency of AAAGTC
is three, if more than three AAAGTC sequences are
embedded in the DNA, the nucleotide sequence AAAGTC can be
employed as a watermark sequence.
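The frequency count and the threshold test of steps 401 to
403 can be sketched as follows. This is a minimal
illustration on made-up pseudo data, not real genomic data;
the function names and the threshold value are assumptions.
The pseudo sequence is built so that AAAGTC occurs three
times, as in the example above.

```python
from collections import Counter
from itertools import product


def kmer_counts(dna: str, k: int = 6) -> Counter:
    """Count every overlapping partial sequence of length k in dna."""
    return Counter(dna[i:i + k] for i in range(len(dna) - k + 1))


def propose_watermarks(dna: str, k: int, threshold: float) -> list[str]:
    """Return the k-mers whose relative frequency is below threshold."""
    counts = kmer_counts(dna, k)
    total = max(len(dna) - k + 1, 1)
    proposed = []
    for kmer in map("".join, product("ACGT", repeat=k)):
        # step 402: estimate the appearance probability of this k-mer
        probability = counts[kmer] / total  # Counter gives 0 if absent
        # step 403: keep it only if it falls below the step 401 threshold
        if probability < threshold:
            proposed.append(kmer)
    return proposed


pseudo_dna = "AAAGTCGGATCCAAAGTCTTGACCAAAGTCGATTACA"  # made-up data
counts = kmer_counts(pseudo_dna)
print("AAAGTC occurrences:", counts["AAAGTC"])
print("candidates below 5%:", len(propose_watermarks(pseudo_dna, 6, 0.05)))
```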

However, when the organism in which the watermark sequence
is embedded is mated with an organism with no watermark
sequence, the number of watermark sequences in an organism
obtained by one mating is reduced by about half because of
meiosis. To compensate for this phenomenon, multiple
watermark sequences must be embedded in the DNA. Further,
destruction of a watermark sequence due to gene mutation, or
coincidental generation of the same nucleotide sequence as
the watermark sequence, must be taken into account.
Therefore, the number of watermark sequences should be
determined while taking into account the fact that the
frequency of the appearance of the nucleotide sequences
differs among organisms due to gene mutation, etc.
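The halving effect can be turned into a rough sizing rule.
The sketch below assumes that each mating with an unmarked
organism halves the expected number of watermark copies and
that mutation is ignored; both function names and the
example figures are hypothetical.

```python
def expected_copies(embedded: int, matings: int) -> float:
    """Expected number of copies left after the given number of
    matings with organisms that carry no watermark sequence."""
    return embedded / (2 ** matings)


def copies_needed(min_detectable: int, matings: int) -> int:
    """Copies to embed so that the expected count after the given
    number of matings is still at least min_detectable."""
    return min_detectable * (2 ** matings)


# Example: keep at least 6 expected copies through 3 matings.
needed = copies_needed(min_detectable=6, matings=3)
print(needed, expected_copies(needed, 3))
```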

As the method for taking into account a difference in the
frequencies of the appearance of the nucleotide sequences
among organisms, DNA sequence data are collected for as many
organisms as possible, and the number of organisms for each
frequency of the appearance of the nucleotide sequences can
be employed as a frequency distribution table. Fig. 6 is a
graph showing a frequency distribution table of the number
of organisms for each frequency of the appearance of the
nucleotide sequence AAAGTC.

In samples taken from 12 organisms in Fig. 6, one type of
organism contains six or more AAAGTC sequences, 8.3% of the
total. Therefore, when AAAGTC is employed as a watermark
sequence and when six or more sequences are embedded in the
DNA of one organism, from the distribution of pseudo data in
Fig. 6, the watermark sequence will be detected in 8.3% of
organisms wherein the watermark sequences were not embedded.

In this case, it can be understood that the nucleotide
sequence AAAGTC functions as a watermark sequence with an
error rate of 8.3%.
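The error rate read from a frequency distribution such as
Fig. 6 is simply the fraction of unmarked organisms that
already carry at least as many copies as were embedded. A
minimal sketch follows; the per-organism counts are
hypothetical pseudo data for 12 organisms, chosen so that
exactly one of them reaches six copies, and the function
name is illustrative.

```python
def error_rate(copies_per_organism: list[int], embedded_copies: int) -> float:
    """Fraction of unmarked organisms that naturally contain at
    least embedded_copies occurrences of the candidate sequence."""
    hits = sum(1 for c in copies_per_organism if c >= embedded_copies)
    return hits / len(copies_per_organism)


# Hypothetical AAAGTC counts in 12 organisms without a watermark.
pseudo_counts = [0, 1, 1, 2, 2, 2, 3, 3, 3, 4, 5, 6]

rate = error_rate(pseudo_counts, embedded_copies=6)
print(f"error rate: {rate:.1%}")
```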

Further, when one kind or multiple kinds of sequences are
embedded in multiple locations, the probability that the
same nucleotide sequence as the watermark sequence will
occur due to gene mutation, and the probability that the
watermark sequence will be destroyed, can be reduced. For
example, when many of one kind of watermark sequence are
embedded, the probability that all the watermark sequences,
equivalent to the number of detected organisms, will be
changed is very low.

The thus obtained probability is employed to determine a
watermark sequence that can satisfy a requested probability.

An explanation will be given, using pseudo data and with a
protection period of 10 years, for a case wherein the source
of a value-added gene, which was intentionally generated, is
specified by using the watermark sequence in order to
prevent the illegal employment of the value-added gene.

Assume that an estimated 1000 organisms of the same species
as the organism in which the watermark sequence is to be
embedded will be present during the protection period of 10
years.

As is apparent from the frequency distribution in Fig. 6,
8.3% is the probability (error rate) when six nucleotide
sequences AAAGTC are embedded in the DNA and when six or
more of the nucleotide sequences are detected in an organism
wherein the pertinent sequence is not intentionally
embedded. Similarly, 0.02% is the error rate when ten
nucleotide sequences AAAGGT are embedded, and 0.001% is the
error rate when eight nucleotide sequences AAAGGG are
embedded.

As watermark sequences, six nucleotide sequences AAAGTC, ten
nucleotide sequences AAAGGT and eight nucleotide sequences
AAAGGG are embedded in the DNA of a specific organism, and
the probability is calculated treating each as an
independent phenomenon. In this case, for an organism other
than that wherein these watermark sequences are
intentionally embedded, the probability that all of these
nucleotide sequences will be found at a frequency higher
than a given frequency is

8.3 x 0.02 x 0.001 = 0.000166 (%).

Therefore, it can be said that, of the total of 1000
organisms that will be present during the protection period
of 10 years, an organism that coincidentally has the same
sequence will rarely be encountered.
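The combined error rate under the independence assumption
can be sketched as below. Each percentage is converted to a
fraction before multiplying, so the result is a plain
probability rather than a percentage; it is therefore
smaller in scale than the product of the percentage figures
themselves. The function name and the variable names are
illustrative.

```python
import math

# Error rates from the text, in percent, for each watermark sequence.
error_rates_percent = {"AAAGTC": 8.3, "AAAGGT": 0.02, "AAAGGG": 0.001}


def combined_false_match(rates_percent) -> float:
    """Probability (as a fraction) that every watermark sequence is
    matched by chance, treating the events as independent."""
    return math.prod(r / 100.0 for r in rates_percent)


p = combined_false_match(error_rates_percent.values())
# Expected number of coincidental matches among 1000 organisms.
expected_coincidences = 1000 * p
print(p, expected_coincidences)
```

With an expected count this far below one, a coincidental
match among the 1000 organisms of the protection period is
very unlikely, which agrees with the conclusion in the text.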

When a great number of watermark sequences are embedded, a
target DNA can be divided into segments by a restriction
enzyme and these segments can be detected by using a DNA
chip, so that the number of embedded watermark sequences can
roughly be obtained. Therefore, a method can be employed
whereby, if the number of embedded watermark sequences is
statistically significantly large, it can be ascertained
that the watermark sequence has been inserted.

When multiple kinds of nucleotide sequences are to be 
inserted into a DNA as watermark sequences, the amount of 
information to be added to the DNA can be increased by 
managing the combination of watermark sequences. 

(2) Embedding a watermark sequence in DNA

Using a vector, the watermark sequence can be comparatively 
easily embedded in the DNA. 

However, according to this method, the watermark sequence is
inserted at random locations in the DNA. Since the
embedding location cannot be designated, the watermark
sequence may be inserted into a gene portion rather than
into a targeted portion other than a gene. Thus, the
confirmation of safety, which will be described later, is
indispensable.

(3) Confirmation of safety 

As is described above, when a vector is employed to embed a
watermark sequence, the embedding location cannot be
designated, and the watermark sequence may be inserted into
a gene portion. Therefore, the safety of an organism
wherein the watermark sequence has been embedded in DNA must
be confirmed. For this process, whether the watermark
sequence has been inserted into a portion of the DNA other
than the gene portion is not determined; however, in this
embodiment, so long as the watermark sequence is detected in
the DNA, the function of the watermark sequence can be
demonstrated, regardless of its embedded location. As a
result, confirming the safety of the organism is sufficient.

The standard for safety should be determined in accordance
with the function of the value-added gene that is to be
protected (the illegal copying of which should be
prevented). This requires a social agreement, but if the
value-added gene to be protected is socially approved, it is
assumed that a watermark sequence providing the same safety
can also be approved.

For the confirmation of safety, testing should be conducted
using an organism.

Fig. 7 is a flowchart showing the processing performed to
confirm the safety of a watermark sequence. In Fig. 7, of a
number of proposed watermark sequences, a single arbitrary
watermark sequence is selected (step 701). The selected
watermark sequence is embedded in the DNA of a predetermined
organism and the procedure is paused while the organism is
growing (step 702). Then, the safety of the organism that
has grown is examined, and if the result is not
satisfactory, another proposed watermark sequence is
selected and the process is repeated (step 703). If the
safety is confirmed, however, the pertinent proposed
sequence is determined to be a watermark sequence (step
704).
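
The loop of steps 701 to 704 can be sketched as follows.
The two callbacks, embed_and_grow and is_safe, are
hypothetical placeholders for laboratory procedures that the
text does not specify; the sketch only shows the control
flow.

```python
from typing import Callable, Iterable, Optional


def confirm_watermark(
    candidates: Iterable[str],
    embed_and_grow: Callable[[str], object],
    is_safe: Callable[[object], bool],
) -> Optional[str]:
    """Try proposed sequences one at a time until one passes."""
    for candidate in candidates:               # step 701: select one
        organism = embed_and_grow(candidate)   # step 702: embed and grow
        if is_safe(organism):                  # step 703: examine safety
            return candidate                   # step 704: adopt it
    return None  # no proposed sequence passed the safety check


# Stand-in callbacks for illustration only.
chosen = confirm_watermark(
    ["AAAGTC", "AAAGGT"],
    embed_and_grow=lambda seq: {"watermark": seq},
    is_safe=lambda organism: organism["watermark"] == "AAAGGT",
)
print(chosen)
```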

(4) Detection of watermark sequence 

A nucleotide sequence that is complementary to an embedded
watermark sequence can be employed to detect a watermark
sequence in DNA. Fig. 8 is a diagram showing the state
wherein a watermark sequence is detected using the
complementary nucleotide sequence.

In Fig. 8, the watermark sequence TTTATTACA is embedded in
DNA, and the nucleotide sequence AAATAATGT, which is
complementary to this watermark sequence, is employed to
detect the watermark sequence.

Further, when the DNA to be searched is extracted, and
the nucleotide sequence of the DNA is read using a
sequencer, if the watermark sequence is embedded in the DNA
it can be detected. Fig. 9 is a diagram showing the state
wherein the nucleotide sequence of the DNA is read by using
the sequencer, and the watermark sequence AAATAATGT is
detected.
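
Both detection routes can be sketched in a few lines. The
complement is taken base by base to match the example of
Fig. 8 as written; strand orientation (the reverse
complement used in real hybridization) is ignored here. The
read string is made up for illustration, and the function
names are assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq: str) -> str:
    """Base-by-base complement (A<->T, C<->G)."""
    return seq.translate(COMPLEMENT)


def find_watermark(read: str, watermark: str) -> list[int]:
    """0-based start positions in a sequencer read where the
    watermark, or its complement, occurs."""
    targets = {watermark, complement(watermark)}
    width = len(watermark)
    return [i for i in range(len(read) - width + 1)
            if read[i:i + width] in targets]


probe = complement("TTTATTACA")
read = "GGCTTTATTACAGGC"  # made-up read containing the watermark
print(probe, find_watermark(read, "TTTATTACA"))
```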

(5) Toleration of a watermark sequence 

For watermark information to exhibit copy toleration, the
watermark sequence must be copied without deterioration
when the DNA is copied. In this
embodiment, relative to the copying of all of the DNA due to
breeding, copying due to cell transplantation, copying due
to extraction of a chromosome, or copying specifically
performed by removing a region in the DNA to include the
watermark sequence, a watermark sequence that is inserted at
random in DNA locations other than a gene can be copied when
the DNA (or a nucleotide sequence, one part of the DNA) is
copied, without being degraded. In other words, the
toleration is maintained by the watermark information.
However, when the portion extracted from the DNA that has
been copied is so small that a watermark sequence is
probably not included, the watermark sequence is not copied
to the nucleotide sequence copy. Therefore, for such
copying, the watermark information in this embodiment does
not exhibit toleration.

When the protein code region is transcribed into mRNA in the
process for synthesizing genes to obtain protein, a portion
other than the gene portion is not present, so that the
watermark sequence is not included. Thus, when the gene is
copied in the mRNA state, in this embodiment the watermark
information does not exhibit toleration.

Second Embodiment 

An explanation will now be given for an embodiment that uses
a method for the insertion of a watermark sequence into the
intron of a gene to be protected.

As is described above, for a higher organism the mRNA
obtained by transcribing the protein code region from the
DNA consists of exons that are to be finally translated into
amino acid and introns that are to be removed during the
process (primary mRNA). Therefore, while taking into
account the effect on an organism into which a watermark
sequence has been inserted, the watermark sequence can be
embedded in the intron that is not employed for the
synthesis of protein.

According to the embodiment, since the watermark sequence is
embedded in the gene portion of the DNA, an advantage is
that for protection the watermark sequence can itself be
embedded into a value-added gene.

Fig. 10 is a diagram for explaining the concept of the
method used for the insertion of a watermark sequence
according to the embodiment.

For this embodiment, an explanation will now be given for
the steps in Fig. 2, i.e., (1) the determination of a
watermark sequence, (2) the embedding of a watermark
sequence in DNA, (3) the confirmation of safety, (4) the
detection of a watermark sequence, and (5) the toleration of
a watermark sequence.

(1) Determination of a watermark sequence

A nucleotide sequence that can be employed as a watermark
sequence is determined. This nucleotide sequence is one
that does not normally appear in the intron of the gene
portion in the DNA of a targeted organism in which the
watermark sequence is to be embedded. The same method as is
used in the first embodiment is employed to determine the
watermark sequence. However, while in the first embodiment
a sequence that overall does not normally appear in the
nucleotide sequences of the DNA is employed, in this
embodiment, a sequence that does not normally appear in the
intron of the gene is employed.

As described in the first embodiment, for a nucleotide
sequence to be used as a watermark sequence, it must not be
biologically significant.

If the genetic sequence of the DNA of the target organism is
already known, this genetic sequence can be employed to
calculate the approximate probability that the sequence will
not normally appear in the intron. For calculation of this
probability, the frequency distribution used in the first
embodiment can be employed for the sequences in the introns
that are collected from many organisms.

Further, when one or several kinds of watermark sequences
are embedded in multiple introns, the probability that
the same nucleotide sequence as the watermark sequence will
occur due to gene mutation and the probability that the
watermark sequence will be destroyed can be reduced. When
multiple kinds of nucleotide sequences are inserted,
management for the combinations of these watermark sequences
is provided, so that the amount of information that is added
to the DNA can be increased.

(2) Embedding a watermark sequence in DNA

In the process for synthesizing genes (including the exon
and the intron), when a nucleotide sequence that is
determined to be a watermark sequence is inserted into a
desired location in an intron, the watermark sequence can be
embedded in the DNA. Preferably, the genes to be
synthesized should be value-added genes that must be
protected; however, they may be other genes.

Furthermore, if a vector can be employed to specify the
location of the intron in the gene and to embed the
nucleotide sequence therein, the watermark sequence can be
embedded in the intron of a desired gene.

In order to embed the watermark sequence using the method of
this embodiment, it is necessary to find the intron portion
in the gene. The splicing for the removal of the intron
from a gene is effected by a spliceosome. It is known that
the nucleotide sequence included in the spliceosome is
easily coupled with the nucleotide sequence of an intron
start portion. Thus, as one method, the nucleotide sequence
included in the spliceosome can be employed to designate
the intron portion of a gene.

(3) Confirmation of safety 

As is described above, the intron is a portion removed by
splicing when the gene is translated into amino acid.
However, since a watermark sequence that is not related to
the genetic information for an organism is inserted into the
gene portion, according to this embodiment it is also
necessary for the safety of the organism in which the
watermark sequence is embedded to be confirmed.

As in the first embodiment, the safety standard should be
determined in accordance with the function of the
value-added gene to be protected.

Furthermore, for confirmation of the safety, an experiment
using an organism must be conducted. The procedures for
confirming the safety of the watermark sequence are
performed in the same manner as in the first embodiment in
Fig. 7.

(4) Detection of watermark sequence

As in the first embodiment, a method for employing a
nucleotide sequence that is complementary to the embedded
watermark sequence, or a method for employing a sequencer to
read the nucleotide sequence in the DNA can be employed to
detect the watermark sequence.

(5) Toleration of a watermark sequence 

As in the first embodiment, the toleration evidenced by the
watermark information in this embodiment is relative to the
copying of the entire DNA, the copying using cell
transplantation, the copying by the extraction of a
chromosome, or the copying especially performed by removing
a region in the DNA that includes the watermark sequence.
Even when the portion extracted from the DNA is copied, so
long as the gene is included in the portion, the intron
portion will always be copied. Therefore, so long as the
copying is performed as gene units, the toleration evidenced
by the watermark information in this embodiment is adequate.

The same thing applies in a case wherein the DNA is copied
in a state wherein the protein code region is transcribed
into the mRNA during the synthesis of genes to obtain
protein.

However, for a higher organism, the intron portion is
removed by splicing before the primary mRNA is translated
into amino acid, and the watermark information in this
embodiment does not possess toleration relative to the
copying from the mRNA after the splicing.

Third Embodiment 

An explanation will now be given for an embodiment using a
method in which codon redundancy is employed to embed
watermark information.

As is described above, the nucleotide sequence of DNA is
coded into amino acid using codon units composed of three
characters. However, since 64 (= 4^3) different three-base
combinations can be formed for 20 kinds of amino acids in an
organism, multiple codons may be present for one type of
amino acid, so that the watermark information can be
embedded in this redundant portion.

The correlation between the codons to be translated into
amino acid during the protein synthesis process and the
amino acid is provided by the codon table. By referring to
the codon table in Fig. 13, it is apparent that multiple
codons are correlated with one amino acid, and that the
redundancy is mainly located at the third character of the
codon. Thus, within the range permitted according to the
codon table, i.e., so long as the codons are correlated with
the same amino acid, each codon in the exon of a gene to be
protected can be freely replaced by another codon. This
degree of freedom is employed to embed the watermark
information. Therefore, in this embodiment, the sequence of
codons selected for the insertion of the watermark
information serves as a watermark sequence.
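
The redundancy can be made concrete by grouping codons by
the amino acid they encode. The sketch below uses the
standard genetic code; Fig. 13 is not reproduced here, so it
is an assumption that the figure shows this same table. The
names CODON_TABLE and synonymous_codons are illustrative.

```python
from collections import defaultdict
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons ordered TTT, TTC, TTA, TTG, TCT, ...
# ('*' marks a stop codon).
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_codons() -> dict[str, list[str]]:
    """Group codons by the amino acid (or stop signal) they encode."""
    groups = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        groups[aa].append(codon)
    return dict(groups)


groups = synonymous_codons()
print(len(CODON_TABLE), "codons for", len(groups) - 1, "amino acids")
print("methionine:", groups["M"], "leucine:", groups["L"])
```

Amino acids with a single codon, such as methionine, offer
no freedom of choice, while those with several synonymous
codons can each carry information.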

According to the present invention, an advantage is that the
watermark information can be embedded in the exons in the
gene portions of the DNA.

Fig. 11 is a diagram for explaining the concept of the
watermark sequence insertion method according to the
embodiment.

For this embodiment, an explanation will now be given for
the steps in Fig. 2, i.e., (1) the determination of a
watermark sequence, (2) the embedding of a watermark
sequence in DNA, (3) the confirmation of safety, (4) the
detection of a watermark sequence, and (5) the toleration of
a watermark sequence.

(1) Determination of a watermark sequence

As is described above, there are multiple codes (codons) of
nucleotide sequences that correspond to one amino acid.
Thus, when the codons corresponding to a predetermined amino
acid are intentionally selected, additional information can
be embedded directly in the gene, without changing the
meaning of the sequence, which is the code of useful protein
(an amino acid sequence that has been coded). In this
process, at the present time, the strict replacement, such
as the replacement of only a desired codon (the base in one
part of the codon) in the DNA, seems to be technically
difficult.

However, instead of directly rewriting the nucleotide
sequence in the DNA, when new protein is designed at the
level of amino acid, or when the amino acid sequence that is
coded using an exotic gene for the insertion is read and a
corresponding DNA is designed, the watermark information can
be embedded in the process of converting the amino acids
into codons.

It should be noted that the employment of the codons differs
in accordance with the species of organisms, and is normally
biased. Thus, when codons that are less frequently employed
are used for a specific organism, there are few
corresponding tRNAs, so that the translation efficiency
will be reduced and the expected function of protein will be
lowered.

Therefore, a method is employed that uses the two codons
whose appearance frequencies are the highest and the second
highest, and that writes information by correlating these
codons with binary data (0, 1).

N codons are employed, and whether each codon (since there
is only one kind of codon corresponding to methionine, this
is excluded) corresponds to 0 or 1 is determined. Then,
when the information is read, a binary data string
(hereinafter referred to as a bit string) having a length N
is obtained. Thereafter, the watermark information is
written using this bit string.

This method will now be described in more detail.

In order to affect the efficiency of the synthesis of
protein as little as possible, the two codons whose
appearance frequencies are the highest and the second
highest are selected for the amino acids other than
methionine, and are allocated values of "0" and "1." All the
portions that can be used for coding can be employed for the
exons in genes in order to embed the information. Further,
codons to be used and codons not to be used may be
distinguished by using a pseudo random key, and the same key
may be employed at detection for the extraction of
information only from codons that are used for embedding.
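
The embedding and extraction just described can be sketched
as follows. The TOP_TWO table of most frequent and second
most frequent codons is hypothetical (real codon usage is
species specific and would have to be measured), and using
the key to shuffle the order of usable codons is only one
possible reading of "distinguished by using a pseudo random
key". All function names are illustrative.

```python
import random

# Hypothetical (bit 0, bit 1) codon pairs for a few amino acids.
TOP_TWO = {
    "L": ("CTG", "CTC"),
    "A": ("GCC", "GCT"),
    "G": ("GGC", "GGA"),
    "K": ("AAG", "AAA"),
}
CODON_TO_AA = {c: aa for aa, pair in TOP_TWO.items() for c in pair}
CODON_TO_BIT = {c: bit for pair in TOP_TWO.values()
                for bit, c in enumerate(pair)}


def split_codons(cds: str) -> list[str]:
    """Split a coding sequence into consecutive three-base codons."""
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def _slots(codons: list[str], key: int) -> list[int]:
    """Indices of codons able to carry a bit, in key-dependent order."""
    eligible = [i for i, c in enumerate(codons) if c in CODON_TO_AA]
    random.Random(key).shuffle(eligible)
    return eligible


def embed(cds: str, bits: str, key: int) -> str:
    """Rewrite synonymous codons so that they spell out bits."""
    codons = split_codons(cds)
    slots = _slots(codons, key)
    if len(bits) > len(slots):
        raise ValueError("not enough usable codons for the message")
    for i, bit in zip(slots, bits):
        amino_acid = CODON_TO_AA[codons[i]]
        codons[i] = TOP_TWO[amino_acid][int(bit)]
    return "".join(codons)


def extract(cds: str, n_bits: int, key: int) -> str:
    """Read n_bits back from the codons selected by the same key."""
    codons = split_codons(cds)
    return "".join(str(CODON_TO_BIT[codons[i]])
                   for i in _slots(codons, key)[:n_bits])


original = "ATG" + "CTGGCCGGCAAG" * 3  # made-up coding sequence
marked = embed(original, "1001", key=7)
print(marked, extract(marked, 4, key=7))
```

Because each replacement stays within one amino acid's codon
pair, the encoded amino acid sequence is unchanged, and
codons outside the table (methionine here) are never
touched.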

There are two problems with this embodiment. As one, a
false positive error may occur whereby, in accordance with
the above described rule, some message will also be
extracted from the gene of an organism in which no
information has been embedded. That is, when a bit string
"1001" is extracted, there is no means for ascertaining
whether the bit string was intentionally embedded
information or a combination that occurred naturally.

As another problem, a false negative error may occur whereby
the rule of repetition may be destroyed because some
mutation has occurred in the genes in which information has
been embedded, and a bit string that represents the
watermark information cannot be detected.

As one method for resolving the problem that is due to a
false positive error, a method for repetitively embedding a
message can be employed. If the probability of the
repetition of the message is sufficiently low for a gene in
which no information has been embedded, it can be
ascertained that information has been embedded in a DNA in
which the repetition of the bit string is detected.

The probability of the occurrence of a false positive error
is obtained as follows.

For simplifying the explanation, only one type of amino
acid, A, is employed for coding. From among codons that
synthesize the amino acid A, assume that the codon whose
appearance frequency in an organism is the highest is
defined as C0 (employment probability P0), and the codon
whose appearance frequency is the second highest is defined
as C1 (employment probability P1). Further, assume that the
bit "0" is allocated to C0 and the bit "1" is allocated to
C1, and that a total number N of C0 and C1 are included in
the exon of a target gene for information embedding. In
this example, the watermark information that is represented
by a predetermined bit string consisting of C0 and C1 is
repetitively embedded m times. In this case, information
consisting of n bits (N = mn) can be embedded.

Under the above assumption, the probability that a false
positive error will occur is represented by equation 1.

[Equation 1]

\[
(\mathrm{false\_positive\_error}) = \sum_{k=0}^{n} \binom{n}{k} (P_0)^{mk} (P_1)^{m(n-k)},
\qquad \text{where } \binom{n}{k} = \frac{n!}{k!\,(n-k)!}
\]

When multiple kinds of amino acids are employed for coding,
the frequency whereat each kind of amino acid appears in an
exon can be substituted into equation 1 to obtain the
probability.
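
The false positive probability can be evaluated numerically
under one reading of the definitions above: an unmarked gene
produces a false positive when some n-bit pattern happens to
be repeated m times, with each codon independently being C0
or C1 with probabilities P0 and P1 (assumed here to sum to
1). A pattern with k zero bits then repeats with probability
(P0^k P1^(n-k))^m, and the sum runs over all patterns. The
function name and the example values are illustrative.

```python
from math import comb, isclose


def false_positive_probability(p0: float, p1: float, n: int, m: int) -> float:
    """Sum over k (the number of 0 bits in an n-bit message) of
    C(n, k) * (p0**k * p1**(n - k))**m."""
    return sum(
        comb(n, k) * (p0 ** k * p1 ** (n - k)) ** m
        for k in range(n + 1)
    )


p = false_positive_probability(p0=0.6, p1=0.4, n=4, m=3)
# By the binomial theorem the sum collapses to (p0**m + p1**m)**n.
closed_form = (0.6 ** 3 + 0.4 ** 3) ** 4
print(p, closed_form)
```

The closed form shows that the probability falls quickly as
the number of repetitions m grows, which is the reason for
embedding the message repeatedly.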

Furthermore, when s bits (s < n) of the n bits are employed
for the message and the remaining (n-s) bits are employed as
an error correction code, the probability of the occurrence
of a false negative error can also be reduced considerably.
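
One concrete instance of "s message bits plus (n-s) check
bits" is the Hamming(7,4) code, with s = 4 and n = 7, which
corrects any single flipped bit. The text does not name a
specific code, so this choice is purely illustrative.

```python
def hamming74_encode(data: str) -> str:
    """Encode 4 message bits into a 7-bit word (3 parity bits)."""
    d1, d2, d3, d4 = (int(b) for b in data)
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p4 = d2 ^ d3 ^ d4
    # Bit positions 1..7 hold p1, p2, d1, p4, d2, d3, d4.
    return "".join(map(str, (p1, p2, d1, p4, d2, d3, d4)))


def hamming74_decode(word: str) -> str:
    """Correct at most one flipped bit and return the 4 message bits."""
    c = [int(b) for b in word]
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s4  # 0 means no error detected
    if position:
        c[position - 1] ^= 1
    return "".join(map(str, (c[2], c[4], c[5], c[6])))


codeword = hamming74_encode("1001")
mutated = codeword[:4] + ("1" if codeword[4] == "0" else "0") + codeword[5:]
print(codeword, mutated, hamming74_decode(mutated))
```

A single mutation that changes one codon choice then no
longer destroys the message, at the cost of three of every
seven embedded bits.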

In the above explanation, from among the codons that
synthesize the amino acid A, the two codons C0 and C1, whose
appearance frequencies are the highest and the second
highest, are assigned bits 0 and 1. However, when codons
other than C0 and C1 are also assigned 0 and 1, the amount
of information to be embedded can be increased.

(2) Embedding a watermark sequence in DNA

When, during the process for the synthesis of genes,
bases constituting a specific codon are appropriately
selected and a bit string representing watermark information
is prepared, the watermark information can be embedded in
the DNA. Further, when the replacement of each base or the
replacement of the bases for each codon is performed as an
extension of the gene synthesis technique, the watermark
information can also be embedded in the DNA.

(3) Confirmation of safety 

A gene in which the watermark information is embedded using
the method of the embodiment synthesizes the same protein as
the gene a target organism originally included. However,
since each codon in a gene is artificially rewritten within
the range of the redundancy, it is difficult to say there is
no side effect affecting an organism. Therefore, also in
this embodiment, the confirmation of safety is required for
an organism in which the watermark information is embedded.

As in the first embodiment, the safety standard should be
determined in accordance with the function of the
value-added gene to be protected.

For the confirmation of safety, an experiment using an
organism should be conducted. The procedures for confirming
the safety of a watermark sequence are performed in the same
manner as in the first embodiment in Fig. 7.

(4) Detection of a watermark sequence

As in the first embodiment, a method for employing a
nucleotide sequence that is complementary to the embedded
watermark sequence, or a method for employing a sequencer to
read the nucleotide sequence in the DNA can be employed to
detect the watermark sequence.

(5) Toleration of a watermark sequence 

As in the first and the second embodiments, the watermark
information in this embodiment has a toleration relative to
the copying of the overall DNA, the copying using cell
transplantation, the copying using the extraction of a
chromosome, or the copying that is especially performed by
removing a region in the DNA that includes the watermark
sequence.

In addition, as in the second embodiment, even when the
portion extracted from the DNA is copied, so long as the
gene is included in the portion, the intron portion is
always copied. Therefore, so long as the copying is
performed as units of genes, in this embodiment the
toleration of watermark information is ensured. The same
thing is applicable for a case wherein the DNA is copied in
a state wherein the protein code region is transcribed into
the mRNA during the synthesis of genes to obtain
protein.

Furthermore, in this embodiment, since watermark information
is embedded in the exons of genes, the watermark information
is also included in the mRNA that is finally translated into
the amino acid. Thus, the watermark information in this
embodiment possesses a toleration that is also relative to
the copying of the mRNA after the splicing has been
performed.

The watermark information that is embedded, in each of the
embodiments, in the DNA in the above described manner can be
employed as information to determine the source of genetic
information, in accordance with the toleration attributable
to the information.

Fig. 12 is a table showing the toleration of the watermark
sequence for the first, the second and the third embodiments
relative to the individual copying methods.

In Fig. 12, all the watermark sequences for the first, the
second and the third embodiments have toleration
relative to mating. The watermark sequences in the
second and the third embodiments have toleration relative to
the copying of the primary RNA. And the watermark sequence
in the third embodiment has toleration relative to the
copying from the mRNA after the splicing.

When the watermark sequence is detected and analyzed, it can
be confirmed that a value-added gene that is included with
the watermark sequence in a gene is a copy of a specific
gene. Further, when the right to produce or to copy this
value-added gene is restricted by the establishment of a
contract, or by another means, it can be determined whether
the copy of the gene is legal, and illegal copying can be
prevented.

As is described above, according to the present invention,
since predetermined information is embedded in the
nucleotide sequence of DNA, the source of the genetic
information in DNA can be identified.

Further, according to the present invention, since 
information that is intentionally embedded in the sequence 
of nucleotides making up DNA is detected and analyzed, it is 
possible to determine whether a predetermined gene owned by 
a predetermined organism is a copy of a specific gene. 

In addition, according to the present invention, since a
check is performed to determine whether a predetermined gene
owned by a predetermined organism is a copy of a specific
gene, the illegal copying of the specific gene by a third
party can be prevented.